Answer the following questions:
In this activity, you will:
O’Sullivan D and Unwin D (2010) Geographic Information Analysis, 2nd Edition, Chapters 8 and 9. John Wiley & Sons: New Jersey.
For this activity you will need the following:
This R markdown notebook.
A file called “Wolfcamp Aquifer.RData”
The data is a set of piezometric head (watertable pressure) observations of the Wolfcamp Aquifer in Texas (https://en.wikipedia.org/wiki/Hydraulic_head). Measures of pressure can be used to infer the flow of underground water, since water flows from high to low pressure areas.
These data were collected to evaluate potential flow of contamination related to a high level toxic waste repository in Texas. The Deaf Smith county site in Texas was one of three potential sites proposed for this repository. Beneath the site is a deep brine aquifer known as the Wolfcamp aquifer that may serve as a conduit of contamination leaking from the repository.
The data set consists of 85 georeferenced measurements of piezometric head. Possible applications of interpolation are to determine sites at risk and to quantify uncertainty of the interpolated surface, to evaluate the best locations for monitoring stations.
It is good practice to clear the working space to make sure that you do not have extraneous items there when you begin your work. The command in R to clear the workspace is rm (for “remove”), followed by a list of items to be removed. To clear the workspace from all objects, do the following:
rm(list = ls())
Note that ls() lists all objects currently on the worspace.
Load the libraries you will use in this activity (load other packages as appropriate).
library(tidyverse)
library(plotly)
Attaching package: <U+393C><U+3E31>plotly<U+393C><U+3E32>
The following object is masked from <U+393C><U+3E31>package:ggplot2<U+393C><U+3E32>:
last_plot
The following object is masked from <U+393C><U+3E31>package:stats<U+393C><U+3E32>:
filter
The following object is masked from <U+393C><U+3E31>package:graphics<U+393C><U+3E32>:
layout
library(spdep)
Load dataset:
load("Wolfcamp Aquifer.RData")
Add polynomial terms:
aquifer <- mutate(aquifer, X3 = X^3, X2Y = X^2 * Y, X2 = X^2,
XY = X * Y,
Y2 = Y^2, XY2 = X * Y^2, Y3 = Y^3)
Linear:
trend1 <- lm(formula = H ~ X + Y, data = aquifer)
summary(trend1)
Call:
lm(formula = H ~ X + Y, data = aquifer)
Residuals:
Min 1Q Median 3Q Max
-367.00 -161.41 -30.74 148.23 651.17
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2591.3266 38.9636 66.51 <2e-16 ***
X -6.7519 0.3439 -19.64 <2e-16 ***
Y -5.9862 0.4066 -14.72 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 203.3 on 82 degrees of freedom
Multiple R-squared: 0.892, Adjusted R-squared: 0.8894
F-statistic: 338.8 on 2 and 82 DF, p-value: < 2.2e-16
Quadratic:
trend2 <- lm(formula = H ~ X^2 + X + XY + Y + Y^2, data = aquifer)
summary(trend2)
Call:
lm(formula = H ~ X^2 + X + XY + Y + Y^2, data = aquifer)
Residuals:
Min 1Q Median 3Q Max
-406.31 -138.87 -13.05 129.29 722.49
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.627e+03 3.833e+01 68.537 < 2e-16 ***
X -8.288e+00 5.659e-01 -14.646 < 2e-16 ***
XY 2.453e-02 7.402e-03 3.314 0.00137 **
Y -6.648e+00 4.327e-01 -15.363 < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 192 on 81 degrees of freedom
Multiple R-squared: 0.9049, Adjusted R-squared: 0.9014
F-statistic: 257 on 3 and 81 DF, p-value: < 2.2e-16
Cubic:
trend3 <- lm(formula = H ~ X^3 + X2Y + X2 + X + XY + Y + Y2 + XY2 + Y3, data = aquifer)
summary(trend3)
Call:
lm(formula = H ~ X^3 + X2Y + X2 + X + XY + Y + Y2 + XY2 + Y3,
data = aquifer)
Residuals:
Min 1Q Median 3Q Max
-394.66 -132.51 -1.24 97.80 497.00
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.477e+03 1.120e+02 22.115 < 2e-16 ***
X -8.432e+00 9.381e-01 -8.989 1.4e-13 ***
X2Y 3.870e-04 1.540e-04 2.513 0.0141 *
X2 -1.868e-02 9.065e-03 -2.061 0.0427 *
XY 3.448e-02 2.766e-02 1.247 0.2164
Y 1.460e+00 4.862e+00 0.300 0.7647
Y2 -9.601e-02 5.791e-02 -1.658 0.1015
XY2 -1.162e-04 1.614e-04 -0.720 0.4738
Y3 2.905e-04 2.012e-04 1.444 0.1529
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 179.6 on 76 degrees of freedom
Multiple R-squared: 0.9219, Adjusted R-squared: 0.9137
F-statistic: 112.1 on 8 and 76 DF, p-value: < 2.2e-16
Based on this, the most likely choice is the cubic trend model.
predict to interpolate the field using your chosen model. Plot the interpolated field using a method of your choice (e.g., ggplot2, 3D plot, etc.)Find the domain of the coordinates:
summary(aquifer[,1:2])
X Y
Min. :-145.24 Min. : 9.414
1st Qu.: -21.30 1st Qu.: 33.682
Median : 11.66 Median : 59.158
Mean : 16.89 Mean : 79.356
3rd Qu.: 70.90 3rd Qu.:131.825
Max. : 112.80 Max. :184.766
Coordinates for prediction:
X.p <- seq(-140, to = 110, by = 5)
Y.p <- seq(10, to = 180, by = 5)
Use these to create interpolation grid:
df.p <- expand.grid(X = X.p, Y = Y.p)
Calculate polynomial terms:
df.p <- mutate(df.p, X3 = X^3, X2Y = X^2 * Y, X2 = X^2,
XY = X * Y,
Y2 = Y^2, XY2 = X * Y^2, Y3 = Y^3)
Obtain predictions:
preds <- predict(trend3, newdata = df.p, se.fit = TRUE, interval = "prediction", level = 0.95)
Append the predictions, intervals, standard errors to the prediction dataframe:
df.p <- data.frame(df.p, z.p = preds$fit[,1], lwr = preds$fit[,2], upr = preds$fit[,3], se.fit = preds$se.fit)
Plot the trend surface:
ggplot(data = aquifer, aes(x = X, y = Y)) +
geom_tile(data = df.p, aes(x = X, y= Y, fill = z.p)) +
scale_fill_distiller(palette = "YlOrRd", trans = "reverse") +
geom_point(aes(color = H)) +
scale_color_distiller(palette = "YlOrRd", trans = "reverse") +
geom_point(shape = 1, size = 2) +
coord_equal()
As an alternative:
Convert predictions to matrix for plotting in plotly:
z.p <- matrix(preds$fit[,1], nrow = 35, ncol = 51, byrow = TRUE)
Plot:
plot_ly(x = ~X.p, y = ~Y.p, z = ~z.p, type = "surface") %>%
add_markers(data = aquifer, x = ~X, y = ~Y, z = ~H)
predict).The simplest way to do this is by looking at the descriptive statistics of the fit:
summary(preds$fit)
fit lwr upr
Min. :1147 Min. : 757.9 Min. :1536
1st Qu.:1644 1st Qu.:1268.4 1st Qu.:2021
Median :2102 Median :1703.6 Median :2488
Mean :2164 Mean :1751.9 Mean :2577
3rd Qu.:2676 3rd Qu.:2212.6 3rd Qu.:3102
Max. :3354 Max. :2946.3 Max. :4388
The range can be plotted as follows. First append the fit and confidence interval to dataframe:
df.p <- data.frame(df.p, z.p = preds$fit[,1], lwr = preds$fit[,2], upr = preds$fit[,3])
Then plot:
ggplot(data = df.p, aes(x = X, y = Y, fill = lwr)) +
geom_tile() +
scale_fill_distiller(palette = "YlOrRd", trans = "reverse") +
coord_equal()
ggplot(data = df.p, aes(x = X, y = Y, fill = upr)) +
geom_tile() +
scale_fill_distiller(palette = "YlOrRd", trans = "reverse") +
coord_equal()
Or in plotly:
z.p_l <- matrix(data = preds$fit[,2], nrow = 35, ncol = 51, byrow = TRUE)
z.p_u <- matrix(data = preds$fit[,3], nrow = 35, ncol = 51, byrow = TRUE)
Plot:
plot_ly(x = ~X.p, y = ~Y.p, z = ~z.p, type = "surface", colors = "YlOrRd") %>%
add_surface(x = ~X.p, y = ~Y.p, z = ~z.p_l, opacity = 0.5, showscale = FALSE) %>%
add_surface(x = ~X.p, y = ~Y.p, z = ~z.p_u, opacity = 0.5, showscale = FALSE) %>%
add_markers(data = aquifer, x = ~X, y = ~Y, z = ~H)
Label the residuals:
aquifer$residuals <- ifelse(trend3$residuals > 0, "positive", "negative")
Plot:
ggplot(data = aquifer, aes(x = X, y = Y, color = residuals)) +
geom_point() +
coord_equal()
The plot suggests pockets of positive and negative residuals. Conduct test with Moran’s I.
First create spatial weights:
w <- knearneigh(as.matrix(aquifer[,1:2]), k = 5) %>% knn2nb() %>% nb2listw()
Moran’s test:
moran.test(trend3$residuals, w)
Moran I test under randomisation
data: trend3$residuals
weights: w
Moran I statistic standard deviate = 2.5826, p-value = 0.004902
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.145630475 -0.011904762 0.003720749
The test fails to reject the null hypothesis, which means that there is residual pattern. If you were asked to guess the value of the residual at location [-25,100], would you say it was most likely to be positive or negative? How can we use this information to improve the quality of our predictions?